KAFKA-6145: Pt 1. Bump protocol version and encode task lag map #8121

ableegoldman · 2020-02-15T01:58:49Z

"First" PR for KIP-441: implement the protocol change so we can encode the task lag info in the subscription

ableegoldman · 2020-02-19T04:45:08Z

Call for review @cadonna @vvcephei cc/ @guozhangwang

ableegoldman · 2020-02-19T04:46:39Z

...ams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignor.java

-            topics,
-            standbyTasks,
-            rebalanceProtocol);
+        // 2. Map from task id to its overall lag


This plus the tech debt cleanup allows for the subscription handling to be greatly simplified, here and below in #assign

ableegoldman · 2020-02-19T07:18:25Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

    /**
     * Returns ids of tasks whose states are kept on the local storage. This includes active, standby, and previously
     * assigned but not yet cleaned up tasks
     */
-    public Set<TaskId> tasksOnLocalStorage() {
+    Set<TaskId> tasksOnLocalStorage() {


This was only ever used to encode the subscription info, which is now all handled by getTaskLags

This method could actually become private, except for a single test. I'm wondering if we can port that test to use getTaskLags instead. Aside from letting us make this private, that would probably improve our testing coverage, since I suppose that test was intended to unit test this class, meaning it should be testing this class's public API, not internal methods.

This method is getting refactored somewhat heavily in the next PR, where we actually collect the offsets and will lay in some heavy testing. Still trying to strike a balance between not writing/fixing up any tests until the end, and spending time on 50 tests per PR that are all rendered useless as soon as I start the next one.
But I will tighten up this test.

cadonna

@ableegoldman Thank you for the PR!

Here is my feedback:

cadonna · 2020-02-21T08:37:10Z

streams/src/main/resources/common/message/SubscriptionInfo.json

@@ -65,6 +70,27 @@
          "type": "int32"
        }
      ]
+    },
+    {
+      "name": "TaskLagPair",


req: I understand why you called this a pair. However, it seems odd that this pair consists of three fields. Could you call it TaskLagTriple?

Ack, will fix this if it turns out nested structs aren't possible

streams/src/main/resources/common/message/SubscriptionInfo.json

cadonna · 2020-02-21T08:58:20Z

streams/src/main/resources/common/message/SubscriptionInfo.json

+        {
+          "name": "topicGroupId",
+          "versions": "1+",
+          "type": "int32"
+        },
+        {
+          "name": "partition",
+          "versions": "1+",
+          "type": "int32"
+        },


Q: Is it not possible to use the struct TaskId here? If not, the versions field of TaskId (and of all nested fields?) should be set to 1-6, shouldn't they? TaskId is only used for the fields that are removed in version 7.

Working on finding someone who knows the code generation code to clear things up here (whether we can have nested structs, and what is the correct version for structs/inner fields)

Alright, so the good/bad news is that this only works if the nested struct is an array (i.e., we can do type: []TaskId but not type: TaskId). There's no reason for this to be the case besides "didn't have the time and need to implement it", so we could in theory just add this ability.

I haven't really looked into the code, so I'm not sure how much time that might take. For now I've decided to just encode them as basic types. It's only two fields, after all.
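To make the workaround concrete: because a nested struct is only supported as an array element, the task id is written as two primitive fields next to the per-task value and rebuilt on the reading side. The sketch below models that round trip in plain Java. `TaskId` and `TaskOffsetSumEntry` are hypothetical stand-ins, not the generated message classes, and treating the per-task value as a 64-bit offset sum (its name in the final version of this PR) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FlatTaskIdEncoding {

    // Hypothetical stand-in for Streams' TaskId.
    record TaskId(int topicGroupId, int partition) { }

    // Hypothetical stand-in for one element of the generated array type:
    // the TaskId is spelled out as two int32-style fields next to the value.
    record TaskOffsetSumEntry(int topicGroupId, int partition, long offsetSum) { }

    // Writer side: flatten each TaskId into its two primitive fields.
    static List<TaskOffsetSumEntry> encode(final Map<TaskId, Long> offsetSums) {
        final List<TaskOffsetSumEntry> entries = new ArrayList<>();
        for (final Map.Entry<TaskId, Long> e : offsetSums.entrySet()) {
            final TaskId id = e.getKey();
            entries.add(new TaskOffsetSumEntry(id.topicGroupId(), id.partition(), e.getValue()));
        }
        return entries;
    }

    // Reader side: rebuild the TaskId from the two primitive fields.
    static Map<TaskId, Long> decode(final List<TaskOffsetSumEntry> entries) {
        final Map<TaskId, Long> offsetSums = new HashMap<>();
        for (final TaskOffsetSumEntry entry : entries) {
            offsetSums.put(new TaskId(entry.topicGroupId(), entry.partition()), entry.offsetSum());
        }
        return offsetSums;
    }

    public static void main(final String[] args) {
        final Map<TaskId, Long> original = Map.of(new TaskId(0, 1), 42L, new TaskId(1, 0), 7L);
        System.out.println(decode(encode(original)).equals(original)); // prints true
    }
}
```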

.../src/main/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfo.java

...ams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignor.java

.../test/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfoTest.java

streams/src/test/java/org/apache/kafka/streams/tests/StreamsUpgradeTest.java

streams/src/main/resources/common/message/SubscriptionInfo.json

vvcephei

Hey @ableegoldman , I did a partial pass. I'll have to review the rest later.

vvcephei · 2020-02-28T23:21:39Z

...ams/src/main/java/org/apache/kafka/streams/processor/internals/StreamsPartitionAssignor.java

-            if (!state.ownedPartitions().isEmpty()) {
+            // this is an optimization: we can't decode the future subscription info's prev tasks, but we can figure
+            // them out from the encoded ownedPartitions
+            if (uuid == futureId && !state.ownedPartitions().isEmpty()) {


Just to be clear, does this mean we're certain that for non-future members (current or older-versioned ones), the encoded "prevTasks" actually contains all the previous tasks?

I gather this is true from the SubscriptionInfo protocol:

{ "name": "prevTasks", "versions": "1-6", "type": "[]TaskId" }

But then, I'm a little mystified by the prior comment... why would "active tasks" not have been encoded with the cooperative protocol?

Let me give a quick history lesson to clarify:
By adding the ownedPartitions field to the subscription we were able to remove the encoded prevTasks and avoid duplicating info we could get from the new field for COOPERATIVE members. Note the ownedPartitions are the source of truth and may differ from the prevTasks that would have been encoded in edge cases (e.g., topic deletion, lost partitions).
In hindsight, aka a few weeks back when I started looking at the assignor code again, I realized this was just unnecessary and likely to lead to more trouble than it solves. So, now we just encode all tasks in the offset map regardless of rebalance protocol.
Ok, while explaining that I now realize we obviously still need to fill in the prevTasks from the ownedPartitions for members on 2.5/2.6 -- thanks Socrates :P
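The fallback described above, rebuilding a member's previous tasks from the partitions it reports as owned, can be sketched as follows. This is a simplified illustration, not the assignor's code: the `partitionToTask` lookup is a hypothetical stand-in for the assignor's partition-to-task mapping, and partitions are modeled as plain strings rather than `TopicPartition`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrevTasksFromOwnedPartitions {

    // Hypothetical stand-in for Streams' TaskId.
    record TaskId(int topicGroupId, int partition) { }

    // Several owned partitions (e.g. of co-partitioned topics) can map to the same
    // task, so collecting into a Set de-duplicates them. Partitions with no known
    // task (for example, of a topic that has since been deleted) are skipped.
    static Set<TaskId> prevTasks(final List<String> ownedPartitions,
                                 final Map<String, TaskId> partitionToTask) {
        final Set<TaskId> tasks = new HashSet<>();
        for (final String partition : ownedPartitions) {
            final TaskId task = partitionToTask.get(partition);
            if (task != null) {
                tasks.add(task);
            }
        }
        return tasks;
    }

    public static void main(final String[] args) {
        final Map<String, TaskId> partitionToTask = Map.of(
            "orders-0", new TaskId(0, 0),
            "payments-0", new TaskId(0, 0),
            "orders-1", new TaskId(0, 1));
        // orders-0 and payments-0 collapse into one task, so this prints 2
        System.out.println(prevTasks(List.of("orders-0", "payments-0", "orders-1"), partitionToTask).size());
    }
}
```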

vvcephei · 2020-02-28T23:31:19Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+            if (isActive(id)) {
+                taskLags.put(id, ACTIVE_TASK_SENTINEL_LAG);
+            } else {
+                taskLags.put(id, 0);


Suggested change:

-                taskLags.put(id, 0);
+                taskLags.put(id, STANDBY_TASK_SENTINEL_LAG);

Either that, or my preference would actually be to inline both sentinel lags (with a comment explaining why that choice of sentinels).
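Applied to the snippet above, the named-constant option might look like the sketch below. The sentinel values (-1 for active, 0 for standby) are taken from the literals used elsewhere in this review; the standalone `taskLags` helper and the `TaskId` record are hypothetical, and treating the standby value as a placeholder until real lags are collected is an assumption based on the plan for the next PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class TaskLagSentinels {

    // Hypothetical stand-in for Streams' TaskId.
    record TaskId(int topicGroupId, int partition) { }

    // Negative, so an active task can never be mistaken for a real (non-negative)
    // lag, and it always compares below the standby value.
    static final int ACTIVE_TASK_SENTINEL_LAG = -1;
    // Placeholder for standby tasks until real lags are computed.
    static final int STANDBY_TASK_SENTINEL_LAG = 0;

    static Map<TaskId, Integer> taskLags(final Set<TaskId> tasksOnLocalStorage,
                                         final Set<TaskId> activeTasks) {
        final Map<TaskId, Integer> taskLags = new HashMap<>();
        for (final TaskId id : tasksOnLocalStorage) {
            taskLags.put(id, activeTasks.contains(id) ? ACTIVE_TASK_SENTINEL_LAG : STANDBY_TASK_SENTINEL_LAG);
        }
        return taskLags;
    }

    public static void main(final String[] args) {
        final TaskId active = new TaskId(0, 0);
        final TaskId standby = new TaskId(0, 1);
        final Map<TaskId, Integer> lags = taskLags(Set.of(active, standby), Set.of(active));
        System.out.println(lags.get(active) + " " + lags.get(standby)); // prints -1 0
    }
}
```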

vvcephei · 2020-02-28T23:32:26Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

@@ -103,7 +105,8 @@ InternalTopologyBuilder builder() {
        return builder;
    }

-    void handleRebalanceStart(final Set<String> subscribedTopics) {
+    // visible for testing
+    public void handleRebalanceStart(final Set<String> subscribedTopics) {


To be honest, this also bugs me :)

What if we move org.apache.kafka.streams.tests.StreamsUpgradeTest.FutureStreamsPartitionAssignor into package org.apache.kafka.streams.processor.internals? Then package-private would continue to work fine.

My personal bias is that anytime you see // visible for testing, you're looking at a potential bug, because nothing prevents that comment from becoming false, and in fact, I have found such comments in our code base that were already false. Either this method is part of the public contract of the class, or it's not.

That said, if you really prefer it this way, we can keep it (although, I might ask you to review a clean-up PR later ;) )

vvcephei · 2020-02-28T23:37:26Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

    /**
     * Returns ids of tasks whose states are kept on the local storage. This includes active, standby, and previously
     * assigned but not yet cleaned up tasks
     */
-    public Set<TaskId> tasksOnLocalStorage() {
+    Set<TaskId> tasksOnLocalStorage() {


This method could actually become private, except for a single test. I'm wondering if we can port that test to use getTaskLags instead. Aside from letting us make this private, that would probably improve our testing coverage, since I suppose that test was intended to unit test this class, meaning it should be testing this class's public API, not internal methods.

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

vvcephei · 2020-02-28T23:42:25Z

.../src/main/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfo.java

+        final Set<TaskId> standbyTasks = new HashSet<>();
+
+        for (final Map.Entry<TaskId, Integer> taskOffsetSum : taskOffsetSums.entrySet()) {
+            if (taskOffsetSum.getValue() == -1) {


Oh, actually, here's the reason a constant sentinel is nice, but we didn't actually use it!
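On the decoding side, the same named constant replaces the literal `-1`. A minimal sketch, assuming (as the snippet suggests) that entries matching the sentinel become previous active tasks and every other entry is collected as a standby; the class, `split` helper, and constant name are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TaskOffsetSumSplit {

    // Hypothetical stand-in for Streams' TaskId.
    record TaskId(int topicGroupId, int partition) { }

    // Declared as a primitive int so that == against an Integer map value
    // unboxes and compares numerically rather than by object reference.
    static final int ACTIVE_TASK_SENTINEL = -1;

    // Entries carrying the active sentinel were previously active tasks;
    // every other entry is treated as a standby (or other local state).
    static void split(final Map<TaskId, Integer> taskOffsetSums,
                      final Set<TaskId> prevTasks,
                      final Set<TaskId> standbyTasks) {
        for (final Map.Entry<TaskId, Integer> taskOffsetSum : taskOffsetSums.entrySet()) {
            if (taskOffsetSum.getValue() == ACTIVE_TASK_SENTINEL) {
                prevTasks.add(taskOffsetSum.getKey());
            } else {
                standbyTasks.add(taskOffsetSum.getKey());
            }
        }
    }

    public static void main(final String[] args) {
        final Map<TaskId, Integer> sums = new HashMap<>();
        sums.put(new TaskId(0, 0), ACTIVE_TASK_SENTINEL);
        sums.put(new TaskId(0, 1), 0);
        final Set<TaskId> prev = new HashSet<>();
        final Set<TaskId> standby = new HashSet<>();
        split(sums, prev, standby);
        System.out.println(prev.size() + " active, " + standby.size() + " standby"); // prints 1 active, 1 standby
    }
}
```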

vvcephei

Thanks for the update, @ableegoldman ! I've completed my pass and left a few more comments.

vvcephei · 2020-03-02T16:10:33Z

.../src/main/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfo.java

+        }).collect(Collectors.toList()));
+    }
+
+    private static void setPrevAndStandbySetsFromParsedTaskOffsetSumMap(final SubscriptionInfoData data,


Since we invoke this method from a number of places, should we add a flag and make sure it only sets the state once?

I realized the other callers don't actually need to call this at all, so now it's only called from the constructor.

vvcephei · 2020-03-02T20:42:35Z

...ava/org/apache/kafka/streams/processor/internals/assignment/LegacySubscriptionInfoSerde.java


    public LegacySubscriptionInfoSerde(final int version,
                                       final int latestSupportedVersion,
                                       final UUID processId,
                                       final Set<TaskId> prevTasks,
                                       final Set<TaskId> standbyTasks,
-                                       final String userEndPoint) {
+                                       final String userEndPoint,
+                                       final Map<TaskId, Integer> taskLags) {


I'm probably missing the point here, but I think the idea of this class is that it should not change in response to changes in SubscriptionInfo. I think it's supposed to be a stand-in for the behavior of older Streams versions when the cluster has old and new members running at the same time. Maybe it doesn't really work that way, though, in which case, I might doubt the utility of this class at all, and instead recommend relying on the system tests. Can you comment?

.../test/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfoTest.java

vvcephei · 2020-03-02T20:45:14Z

.../test/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfoTest.java

+            ACTIVE_TASKS,
+            STANDBY_TASKS,
+            "localhost:80",
+            null


Yes, it should... I asked a question about this on that class itself. It seems like you shouldn't have had to modify it at all.

streams/src/test/java/org/apache/kafka/streams/tests/StreamsUpgradeTest.java

vvcephei · 2020-03-02T20:49:32Z

streams/src/test/java/org/apache/kafka/streams/tests/StreamsUpgradeTest.java

-                               final Set<TaskId> standbyTasks,
-                               final String userEndPoint) {
+                               final String userEndPoint,
+                               final Map<TaskId, Integer> taskLags) {


I'm not sure this should be necessary either. IIUC, the "future" subscription info isn't supposed to really be a descendant of the current protocol, just a stand-in for some protocol version bigger than ours, in which case all that really matters is the version number. Its role is just to join the cluster and get downgraded to the "latest" version, in which case it should be able to defer to SubscriptionInfo.
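The point that only the version numbers matter for a "future" member is also reflected in the test helper `encodeFutureVersion`, which allocates just a used version and a supported version. A self-contained sketch of both sides, assuming 7 as the latest supported version (the version that introduces `taskOffsetSums` in this PR) and a hypothetical `isFutureVersion` check on the receiving side:

```java
import java.nio.ByteBuffer;

final class FutureSubscriptionVersion {

    // Assumption: 7 is the latest version this member understands.
    static final int LATEST_SUPPORTED_VERSION = 7;

    // A future subscription only needs its used and supported versions; the
    // receiver cannot parse anything newer, so nothing else is encoded.
    static ByteBuffer encodeFutureVersion(final int usedVersion, final int supportedVersion) {
        final ByteBuffer buf = ByteBuffer.allocate(4 /* used version */ + 4 /* supported version */);
        buf.putInt(usedVersion);
        buf.putInt(supportedVersion);
        buf.rewind();
        return buf;
    }

    // The receiver peeks at the leading version to decide whether it can decode
    // the rest or the member has to be downgraded to the latest known version.
    static boolean isFutureVersion(final ByteBuffer buf) {
        final int usedVersion = buf.getInt(0); // absolute read; position is unchanged
        return usedVersion > LATEST_SUPPORTED_VERSION;
    }

    public static void main(final String[] args) {
        final ByteBuffer future = encodeFutureVersion(LATEST_SUPPORTED_VERSION + 1, LATEST_SUPPORTED_VERSION + 1);
        System.out.println(isFutureVersion(future)); // prints true
    }
}
```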


vvcephei

Thanks @ableegoldman , just a couple of final thoughts, then I think it's good to go.

vvcephei · 2020-03-05T20:32:28Z

.../src/main/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfo.java

+        }).collect(Collectors.toList()));
+    }
+
+    private static void setPrevAndStandbySetsFromParsedTaskOffsetSumMap(final SubscriptionInfoData data,


vvcephei · 2020-03-05T20:38:16Z

streams/src/main/resources/common/message/SubscriptionInfo.json

+    {
+      "name": "taskOffsetSums",
+      "versions": "7+",
+      "type": "[]TaskOffsetSum"


Do we want to try for a slightly more efficient encoding here, of Map[topicGroupId -> Map[partition -> offsetSum]], or do you think this is fine for now?

I was vaguely hoping that we'd add the ability to use nested structs in the near future and could move to use the TaskId struct here, but I suppose we may as well take advantage of this to go for the more efficient encoding when possible. Will do
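The more compact layout suggested here groups tasks by topicGroupId, so each group id is stored once rather than once per task. An in-memory sketch of that grouping and its inverse, assuming a hypothetical `TaskId` record and 64-bit offset sums; it illustrates the shape of the data, not the generated schema or the wire format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class CompactOffsetSumEncoding {

    // Hypothetical stand-in for Streams' TaskId.
    record TaskId(int topicGroupId, int partition) { }

    // Flat TaskId -> offsetSum becomes Map[topicGroupId -> Map[partition -> offsetSum]].
    // TreeMaps keep the output order deterministic.
    static Map<Integer, Map<Integer, Long>> group(final Map<TaskId, Long> offsetSums) {
        final Map<Integer, Map<Integer, Long>> grouped = new TreeMap<>();
        for (final Map.Entry<TaskId, Long> e : offsetSums.entrySet()) {
            grouped.computeIfAbsent(e.getKey().topicGroupId(), k -> new TreeMap<>())
                   .put(e.getKey().partition(), e.getValue());
        }
        return grouped;
    }

    // Reverse the grouping on the receiving side.
    static Map<TaskId, Long> flatten(final Map<Integer, Map<Integer, Long>> grouped) {
        final Map<TaskId, Long> offsetSums = new HashMap<>();
        grouped.forEach((groupId, partitions) ->
            partitions.forEach((partition, sum) -> offsetSums.put(new TaskId(groupId, partition), sum)));
        return offsetSums;
    }

    public static void main(final String[] args) {
        final Map<TaskId, Long> sums = Map.of(
            new TaskId(0, 0), 10L, new TaskId(0, 1), 20L, new TaskId(1, 0), 5L);
        System.out.println(group(sums)); // prints {0={0=10, 1=20}, 1={0=5}}
    }
}
```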

vvcephei · 2020-03-05T20:50:15Z

.../test/java/org/apache/kafka/streams/processor/internals/assignment/SubscriptionInfoTest.java

    private static ByteBuffer encodeFutureVersion() {
        final ByteBuffer buf = ByteBuffer.allocate(4 /* used version */
-                                                       + 4 /* supported version */);


Looks like maybe we have duelling code formatters. Not sure which choice makes more sense.

Sounds like you think the other choice makes more sense 😜 Honestly neither of them looks that good to me but I'll revert the reformatting

vvcephei · 2020-03-05T20:54:11Z

streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java

+
+        for (final TaskId id : tasksOnLocalStorage()) {
+            if (isRunning(id)) {
+                taskOffsetSums.put(id, Task.LATEST_OFFSET);


I'm just a tiny bit uncomfortable with re-using that sentinel, because the correctness of our logic depends on the active sentinel being less than the standby sentinel, so it must be less than zero. Do we have a reason to believe that Task.LATEST_OFFSET would never change to a number that would spoil us here, such as zero?

I actually changed this based on working on the next PR, as Task#changelogOffsets uses this sentinel for exactly the same thing, ie an indicator that the task is running (and active). This is only used in computing the lag info for KIP-535, which has a similar desire to differentiate between a running task that is completely caught up and any other. So, I can't imagine this being changed -- but I can add a comment to the constant explaining it should always be negative (not sure why it's "-2" specifically, as opposed to "-1", do you?)
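If the sentinel is reused, the "must be negative" invariant could be enforced in code as well as documented in a comment. A sketch, assuming a hypothetical local copy of the constant (the thread mentions its value is -2) and a hypothetical `isRunningActive` helper; it is not how `Task` itself declares the constant.

```java
final class ActiveSentinelInvariant {

    // Hypothetical local copy of the sentinel; -2 is the value of
    // Task.LATEST_OFFSET mentioned in this thread.
    static final long LATEST_OFFSET = -2L;

    static {
        // Real offset sums are never negative, so a negative sentinel cannot collide
        // with one and always compares below any standby's offset sum.
        // Fail fast at class initialization if that ever changes.
        if (LATEST_OFFSET >= 0) {
            throw new IllegalStateException("LATEST_OFFSET must be negative");
        }
    }

    static boolean isRunningActive(final long offsetSum) {
        return offsetSum == LATEST_OFFSET;
    }

    public static void main(final String[] args) {
        System.out.println(isRunningActive(-2L) + " " + isRunningActive(0L)); // prints true false
    }
}
```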

vvcephei · 2020-03-06T00:25:57Z

test this please

ableegoldman · 2020-03-06T04:33:06Z

Wow, the tests actually ran AND they all passed on the first try?? Amazing

ableegoldman commented Feb 19, 2020

ableegoldman force-pushed the KIP-441-protocol-change branch from 0e4db53 to b51b63b Compare February 19, 2020 04:51

ableegoldman commented Feb 19, 2020

vvcephei self-requested a review February 19, 2020 19:00

vvcephei added the streams label Feb 19, 2020

vvcephei assigned vvcephei and unassigned vvcephei Feb 19, 2020

cadonna reviewed Feb 21, 2020

ableegoldman force-pushed the KIP-441-protocol-change branch 3 times, most recently from 5b5af30 to eae9f3b Compare February 28, 2020 23:38

vvcephei reviewed Feb 28, 2020

vvcephei reviewed Mar 2, 2020

ableegoldman and others added 16 commits March 4, 2020 15:25

use taskLags to infer prev/standby tasks if necessary

5a67b59

fixing up tests

8b45f78

cleaned up task manager and subscription code, compiles

b4d91a8

bump assignment version handling

2aabf29

add subscription info tests

f510057

fix assignmentinfo test

c063dc4

fixing up tests

424af9a

null check task

cac1f39

github reivew: main code

72ecf58

remove 'paior' from name

c2da9f7

Github suggestions

a37a8eb

Co-Authored-By: Bruno Cadonna <[email protected]>

use offset sum instead of lag

40e20e3

fix names

7d8bdc2

github review

e4e941f

bump version in VP system test

0fe088a

review

361e449

checkstyle

951c245

ableegoldman force-pushed the KIP-441-protocol-change branch from c0001fc to 951c245 Compare March 4, 2020 20:25

ableegoldman added 3 commits March 4, 2020 15:35

only encode RUNNING active tasks as -1

41d70a2

fix javadocs

aeebfa8

reuse existing Task.LATEST_OFFSET for active running task sentinel offset

16be046

vvcephei reviewed Mar 5, 2020

ableegoldman added 4 commits March 5, 2020 14:24

github review

cce6847

encode task id offset mapcompactly

36b94cc

fx NPE

f0f2022

fix VP upgrade test

9c1849c

vvcephei merged commit 674360f into apache:trunk Mar 6, 2020

mjsax added the kip Requires or implements a KIP label Jun 12, 2020

ableegoldman deleted the KIP-441-protocol-change branch June 26, 2020 22:37
